{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"intro-to-langkit\"></a>\n",
    "## 📖 Intro to LangKit\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/langkit/blob/main/langkit/examples/Intro_to_Langkit.ipynb)\n",
    "\n",
    "\n",
    "\n",
    "Table of Contents\n",
    "- [Intro to LangKit](#intro-to-langkit)\n",
    "- [What is LangKit](#what-is-langkit)\n",
    "- [Initialize LLM metrics](#initialize-llm-metrics)\n",
    "- [Hello, World!](#hello-world)\n",
    "- [Comparing Data](#comparing-data)\n",
    "- [Monitor Metrics over time](#monitor-metrics-over-time)\n",
    "- [Next Steps](#next-steps)\n",
    "\n",
    "<a id=\"what-is-langkit\"></a>\n",
    "### 📚 What is LangKit?\n",
    ">LangKit is an open-source text metrics toolkit for monitoring language models. It offers an array of methods for extracting relevant signals from the input and/or output text, which are compatible with the open-source data logging library [whylogs](https://whylogs.readthedocs.io/en/latest).\n",
    ">In this example, we'll look for distribution drift in the sentiment scores on the model response.💡\n",
    "\n",
    "Ok! let's install __langkit__."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Note: you may need to restart the kernel to use updated packages.\n",
    "%pip install -U langkit[all]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"initialize-llm-metrics\"></a>\n",
    "### 🚀 Initialize LLM metrics\n",
    "LangKit provides a toolkit of metrics for LLM applications, lets initialize them!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langkit import llm_metrics # alternatively use 'light_metrics'\n",
    "import whylogs as why\n",
    "\n",
    "why.init()\n",
    "# Note: llm_metrics.init() downloads models so this is slow first time.\n",
    "schema = llm_metrics.init()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"hello-world\"></a>\n",
    "### 👋 Hello, World!\n",
    "In the below code we log a few example prompt/response pairs and send metrics to WhyLabs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There are 50 records in this toy example data, here's the first one:\n",
      "prompt: Hello, response: World!\n",
      "\n",
      "✅ Aggregated 50 rows into profile 'langkit-sample-chats-all'\n",
      "\n",
      "Visualize and explore this profile with one-click\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🔍 https://hub.whylabsapp.com/resources/model-1/profiles?sessionToken=session-8gcsnbVy&profile=ref-AVbB1yOaSblsa89U\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "from langkit.whylogs.samples import load_chats, show_first_chat\n",
    "\n",
    "# Let's look at what's in this toy example:\n",
    "chats = load_chats()\n",
    "print(f\"There are {len(chats)} records in this toy example data, here's the first one:\")\n",
    "show_first_chat(chats)\n",
    "\n",
    "results = why.log(chats, name=\"langkit-sample-chats-all\", schema=schema)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"comparing-data\"></a>\n",
    "### 🔍 Comparing Data\n",
    "Things get more interesting when you can compare two sets of metrics from an LLM application. The power of gathering systematic telemetry over time comes from being able to see how these metrics change, or how two sets of profiles compare. Below we asked GPT for some positive words and then asked for negative words as part of these two toy examples.\n",
    "\n",
    "> 💡 Take a look at the difference in the `response.sentiment_nltk` distributions between the two profiles. The first example is much more positive than the second example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "prompt: What do you think about puppies? response: Puppies are absolutely adorable. Their playful nature and boundless energy can bring a lot of joy and happiness.\n",
      "\n",
      "prompt: Can you describe a difficult day? response: A difficult day might be filled with challenging tasks, stressful situations, and unexpected obstacles. :-( These moments can feel overwhelming and can lead to feelings of frustration!\n",
      "\n",
      "✅ Aggregated 14 lines into profile 'positive_chats', 20 lines into profile 'negative_chats'\n",
      "\n",
      "Visualize and explore the profiles with one-click\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "🔍 https://hub.whylabsapp.com/resources/model-1/profiles?sessionToken=session-8gcsnbVy&profile=ref-ONdpltPp9jno6qMM&profile=ref-HoQEiEKtaaAvGG7S\n",
      "\n",
      "Or view each profile individually\n",
      " ⤷ https://hub.whylabsapp.com/resources/model-1/profiles?session-8gcsnbVy&profile=ref-ONdpltPp9jno6qMM\n",
      " ⤷ https://hub.whylabsapp.com/resources/model-1/profiles?session-8gcsnbVy&profile=ref-HoQEiEKtaaAvGG7S\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "pos_chats = load_chats(\"pos\")\n",
    "show_first_chat(pos_chats)\n",
    "\n",
    "neg_chats = load_chats(\"neg\")\n",
    "show_first_chat(neg_chats)\n",
    "\n",
    "results_comparison = why.log(multiple={\"positive_chats\": pos_chats,\n",
    "                                       \"negative_chats\": neg_chats},\n",
    "                            schema=schema)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example we are using anonymous guest sessions to view the collected metrics in WhyLabs, You can explore and compare specific metrics, in this example we expect a large and obvious distribution drift in the sentiment scores on the response which you can see [here](https://hub.whylabsapp.com/resources/model-1/profiles?feature-highlight=response.sentiment_nltk&includeType=discrete&includeType=non-discrete&limit=30&offset=0&sessionToken=session-8gcsnbVy&sortModelBy=LatestAlert&sortModelDirection=DESC&profile=ref-ONdpltPp9jno6qMM&profile=ref-HoQEiEKtaaAvGG7S).\n"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"monitor-metrics-over-time\"></a>\n",
    "### 📈 Monitor Metrics over time\n",
    "\n",
    "\n",
    "WhyLabs hosts a public demo org where we show a test LLM application being monitored over time. With LangKit telemetry collected over time, you can detect changes systematically and be alerted to potential problems in a deployed LLM application. We think a variety of different metrics can be helpful in detecting anamolies. In this example, something has triggered a series of alerts. Can you guess what went wrong? \n",
    "\n",
    "##### 🎯 **Explore a demo environment** [here](https://hub.whylabsapp.com/resources/demo-llm-chatbot/columns/prompt.sentiment_nltk?dateRange=2023-06-08-to-2023-06-09&sortModelBy=LatestAlert&sortModelDirection=DESC&targetOrgId=demo&sessionToken=session-8gcsnbVy)!\n",
    "\n",
    "<a id=\"next-steps\"></a>\n",
    "### ✅ Next Steps\n",
    "If you see value in detecting changes in how your LLM application is behaving, you might take a look at some of our other examples showing how to monitor these metrics as a timeseries for an LLM application in production, or how to customize the metrics logged by using your own surrogate models or critic metrics.\n",
    "* Check out the [examples](https://github.com/whylabs/langkit/tree/main/langkit/examples) folder for scenarios from [\"Hello World!\"](https://github.com/whylabs/langkit/blob/main/langkit/examples/Logging_Text.ipynb) to [monitoring an LLM](https://github.com/whylabs/langkit/blob/main/langkit/examples/LLM_to_WhyLabs.ipynb) in production!\n",
    "* Learn more about the [features](https://github.com/whylabs/langkit#features-%EF%B8%8F) LangKit extracts out of the box.\n",
    "* Learn more about LangKit's [modules documentation](https://github.com/whylabs/langkit/blob/main/langkit/docs/modules.md).\n",
    "* Explore more on [WhyLabs](https://whylabs.ai/safeguard-large-language-models?utm_source=github&utm_medium=referral&utm_campaign=langkit) and monitor your LLM application over time!\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3.11.2 64-bit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "orig_nbformat": 4,
  "vscode": {
   "interpreter": {
    "hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
